DNA sequence assembly and multiple sequence alignment by an Eulerian path approach.
Abstract
We describe an Eulerian path approach to DNA fragment assembly that was originated by Idury and Waterman (1995) and then advanced by Pevzner et al. (2001b). This combinatorial approach bypasses the traditional “overlap-layout-consensus” approach and has successfully resolved some of the troublesome repeats in practical assembly projects. The assembly results of the Eulerian path approach are accurate, and its computation is significantly more efficient than that of other assembly programs. As an extension, we use the Eulerian path idea to address the multiple sequence alignment problem. In particular, our goal is to align thousands of sequences simultaneously, which is computationally exorbitant for all existing alignment algorithms. As a beginning, we focus on DNA sequence alignment. Our method can align hundreds of DNA sequences within minutes with high accuracy, and its computational time is linear in the number of sequences. We demonstrate its performance by alignments of simulated sequences and by an application in a resequencing project of Arabidopsis thaliana. Although it has some weaknesses, including in aligning gap-rich regions, the Eulerian path approach is distinguished from other existing algorithms in solving either the fragment assembly or the multiple alignment problem.
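The core idea in the abstract can be illustrated with a minimal sketch. This is not the authors' program: it assumes error-free reads from a single strand, no repeats longer than k-1 letters, and that an Eulerian path exists. The function names (`de_bruijn_edges`, `eulerian_path`, `assemble`) are hypothetical and chosen only for illustration. Each distinct k-mer in the reads becomes an edge between its (k-1)-letter prefix and suffix, and a path that uses every edge once spells out the assembled sequence.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a graph where each distinct k-mer is an edge prefix -> suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    out_deg = {u: len(vs) for u, vs in graph.items()}
    in_deg = defaultdict(int)
    for vs in graph.values():
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes) if out_deg.get(n, 0) - in_deg.get(n, 0) == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the k-mer graph."""
    path = eulerian_path(de_bruijn_edges(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4))  # prints ATGGCGTGCA
```

Because the graph is built from k-mers rather than from pairwise read overlaps, its size in this sketch depends on the number of distinct k-mers, which is one intuition for why the approach avoids the all-against-all comparison of overlap-layout-consensus.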
Related articles
An Eulerian Path Approach to Global Multiple Alignment for DNA Sequences
With the rapid increase in the dataset of genome sequences, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available...
Eulerian Path Methods for Multiple Sequence Alignment
With the rapid increase in the size of genome sequence databases, the multiple sequence alignment problem is increasingly important and often requires the alignment of a large number of sequences. Beginning in 1975, many heuristic algorithms have been created to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally distinct from all c...
An Application of the ABS LX Algorithm to Multiple Sequence Alignment
We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...
An Eulerian path approach to local multiple alignment for DNA sequences.
Expensive computation in handling a large number of sequences limits the application of local multiple sequence alignment. We present an Eulerian path approach to local multiple alignment for DNA sequences. The computational time and memory usage of this approach are approximately linear in the total size of sequences analyzed; hence, it can handle thousands of sequences or millions of letters s...
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
Journal title:
- Cold Spring Harbor symposia on quantitative biology
Volume 68
Publication date: 2003